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Abstract 

We give an algorithm, with R code, to minimize the multidimensional scaling stress 
loss function under the condition that some or all of the fitted distances are between 
given positive upper and lower bounds. This paper combines theory, algorithms, code, 
and results of De Leeuw (2017b) and De Leeuw (2017a). 
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1 Introduction 

In this paper we study the minimization of stress, defined as 

1 n n 

stress(X) := - - dp-(X)) 2 

i= i i=i 

over n x p configuration matrices X. Here W = {try,} and A = {hp} are given symmetric, 
hollow, and non-negative matrices of, respectively, weights and dissimilarities. The difiX) 
are Euclidean distances between the points with coordinates in the rows of X. Thus 

dij(X) := (xi - Xj)'(xi - xj) = (e; - efi)'XX\ei - efi) = tr X'AijX. 

Here e* and e 3 are unit vectors, with all elements equal to zero except for the single element i 
or j, which is equal to one. Also A VJ := (e* — efilpi — efi)'. 

The problem of minimizing stress is well-studied (see, for example, Borg and Groenen (2005)). 
In this paper we analyze the problem of minimizing stress over all configurations X for 
which aij < d t] (X) < for some or all pairs i , j, where the a t j and /% are given bounds 
satisfying 0 < < fiij < +oo. We suppose the constraints are strongly consistent, i.e. there 

is at least one n x p configuration matrix A" such that < difiX) < fiij. 

These optimization problems are non-standard in various ways. Although the set of X such 
that dij(X) < fiij is convex, the constraints dij(X) > a tJ define a reverse convex set (Meyer 
(1970)) in configuration space. The loss funcion stress is neither convex nor concave. We can, 
however, use basic smacof theory (De Leeuw (1977)) to majorize stress by a convex quadratic 
and to majorize the constraints by linear or convex quadratic functions. Majorization can 
be used to define a simple iterative algorithm. In each iteration we majorize stress and the 
constraints, and then minimize the convex quadratic majorizer over configurations satisfying 
the majorized constraints using quadratic programming with quadratic constraints or QPQC 
(De Leeuw (2017c)). Compute a new majorization at the minimizer of the majorized problem, 
and so on. These are the two steps in the majorization or MM approach to optimization (De 
Leeuw (1994), Lange (2016)). 

1.1 Simplifying stress 

A simplified expression for stress, first used in De Leeuw (1993), is 

1 k - 

stress(x) = ~^2w k (S k - \jx'A k x ) 2 . 

^ fc=i 

We have put the elements below the diagonal of W and A in a vector of length ^n(n — 1). 
Also x := vec(A") and the A k are the direct sums of p copies of the corresponding A^. Now 
expand the square, and assume without loss of generality that \ J2 k =i w kd k = 1. Then 

K r - 1 

stress(x) = 1 — 5Z w kbk\!x'A k x + -x'Vx, 
k =i ^ 


2 



with V := Zk=i w k A k . 

Suppose V = KA?K' is a complete eigen-decomposition of V. Some care is needed to handle 
the fact that V is singular, but under the assumption of irreducibility (De Leeuw (1977)) 
this is easily taken care of. Change variables with z = A 2 K'x. Then 


stress(o) 
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with A k = A zK'AkKA 2 , so that Yl, k =\ w k A k = /. 


1.2 Smacof Notation and Theory 

We have see in the previous section that we can write 

stress(o;) = 1 — p(x) + -a/a;, 


where 


I\ 

p{x) := w k 8 k d k (x), 
k =1 


with distances d k (x) := \Jx'A k x, and the A k positive semi-dehnite matrices that add up to 
the identity. By Cauchy-Schwarz, 


p(x) = x'B(x)x > x'B(y)y, 


where 


K 


8 k 


B{x) : = Y, w k-rTA A 


k - 


d k( x Y 

If we define the Guttman transform, if x as T(x) := B(x)x , then for all x 

(. stress)(x ) = 1 — x'r(x) + ^ x'x = (1 — ^T(a;)T(a;)) + ^(a; — T(a;)) / (a; — T(x)), 
and for all x, y 

(stress) < 1 - x'T(y) + ^x'x = (1 - ^T(y)'T(y)) + ^(x - r(y)'(a; - T(j/)), 

In this notation a smacof iteration is a/ fc+ b — r(a/ fc )). 


2 Some Majorization Theory 

Suppose the problem we are interested in is to minimize a real-valued / 0 over all x G M n that 
satisfy fj(x) < 0 for all j — 1, • • • , m. We call this problem V. 

Now suppose gj : M n <S> M n —> M majorizes fj, i.e. 
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fj(x ) < 9j{x,y) for all x,y e M n , 
fj(y ) = 9j{y,y) for a11 y e 

To simplify matters we suppose all fj and g 3 are continuously differentiable, and all g 3 are 
convex in their first argument. Define problem Vk as the problem of minimizing g 0 (x,x^ k ~^) 
over the x G M n that satisfy gj(x,x^ k ~^) < 0. Dehne a sequence x^ with a^ 0 ) feasible for V, 
and with x^ for k > 1 a solution of Vk- 

Theorem 1: [Convergence] 

1. x ^ is feasible for V for all k > 1. 

2. / 0 (aA fc )) < f 0 (x for all k > 1. 

Proof: For the first part fj(x^) < gj{x^ k \x^ k ~ "')) by majorization, and gj{x^ k \x^ k ' _1) ) — o 
by feasibility for Vk- For the second part fo(x^) < go(x^ k \x^ k ^) by majorization. Because 
x^ solves Vk we have g 0 (x^ k \ x^ k ~^) < g 0 (x ( ' k ^ 1 \x^ k ~ 1 ^) if x^ k ~^ is feasible for Vk- i.e. if 
gj{x^ k ~ l \x^ k ~ 1 " > ) < 0. But gj{x^ k ~ l \ x^ k ~ 1 ^) = fj(x^ k ~^) < 0 by the hrst part of the theorem. 

QED 

2.1 Fixed Points 

By theorem 1 we have a sequence of points with decreasing function values that are all 
feasible for V. We can say a bit more about accumulation points of this sequence. Dehne 
V(y) as the convex problem of minimizing go(x,y) over x e M” and gj(x,y) < 0. 

Theorem 2: [Necessary] 

If x solves V(x) then x satisfies the hrst order necessary conditions for a local minimum of V. 

Proof: The necessary and sufficient conditions for a; to be a minimum of V(y) are that there 
exists A > 0 such that 

m 

vMx, y) + J2 \ v i9j( x , y) = o, 

3 =1 

with in addition gj(x,y) < 0 and A jgj(x,y) = 0. 

The hrst order necessary conditions for a: to be a local minimum of V are that there exists 
A > 0 such that 

m 

^/o(a:) + EW(2) = 0 ) 

3=1 

with in addition fj(x) < 0 and A jfj(x) = 0. 

But if the fj and gj are differentiable, we know from majorization theory (De Leeuw (2016), 
Lange (2016)) that Vfj(x) = gj(x,x) for all x. 

QED 
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It follows that if we define T{y) as the solution of V{y) then fixed points of T are local 
minimizers of V. Thus if converges to a fixed point of J r , it converges to a local minimizer 
of V, and /o(at fc ') converges to a local minimum. 

3 MDS with Distance Bounds 

There are two types of constraints. The first are y/x'A k x < j3 k , which we transform simply to 

x'A k x < Pi (1) 

and otherwise leave undisturbed. No majorization is required. 

The second set of constraints is \Jx'A k x > a k or a k — a Jx' A k x < 0. We majorize at x^ k ~ x \ 
using Cauchy-Schwarz, to 


a k - \ x'A k x (k 1} < 0 (2) 

d k {x^ T) 

Thus the QPQC problem we have to solve in each bounded smacof iteration, a.k.a. problem 
V k , is minimization of the quadratic 1 — x'Y{x^ x ^) _|_ ^x'x over x satisfying the quadratic 
constraints (1) and the linear constraints (2). 

4 Software 

Minimizing a convex quadratic under convex quadratic constraints has been implemented in 
the R function qpqc, described in De Leeuw (2017c). It dualizes the problem and then uses 
the nnnewtin function for Newton’s method with non-negativity constraints. We apply qpqc 
to solve our problems V k , using the fact that the linear constraints (2) are just quadratic 
constraints with a zero quadratic component. 

The smacofUpDown function has two arguments without default values, a vector delta of 
length ^n(n — 1) with dissimilarities (lower-diagonal, columnwise, same ordering as a dist 
object) and an n X p matrix xini with an initial configuration. 

args (smacofUpDown) 

## function (delta, xini, w = rep(l, length(delta)), bndlw = rep(0, 

## length(delta)), bndup = rep(Inf, length(delta)), itmax = 1000, 

## eps = le-10, verbose = FALSE) 

## NULL 

The arguments bndlw and bndup are vectors of length \n(n — 1), with default values 0 and 
Too. Thus, by default, smacofUpDown computes an unconstrained weighted least squares 
multidimensional scaling solution, where the weights in w are by default all equal to Tl. 
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There is no default value for xini, and the user must make sure that the distances of the 
initial configuration are between the bounds (are feasible). This is easy enough to do if 
there are only lower bounds (take any configuration and make it big enough) or only uppper 
bounds (make it small enough). If there are both upper and lower bounds then feasibility 
should be checked, because smacofUpDown will refuse to start if the initial configuration has 
any infeasible distances. 


5 Examples 


5.1 Equidistances 


Our first example has n = 10 with all 5ij equal. The optimal smacof solution in two 
dimensions needs 131 iterations to arrive at stress 0.1098799783. We reanalyze the data with 
the constraint that the distance between the first point and all nine others is at least one. 
The algorithm incorporating these bounds, implemented in the function smacof Above, uses 
156 iterations and finds stress 0.1340105193. The two configurations are in figure 1. There 
are five active constraints, i.e. distances equal to one, indicated by lines in the plot. 
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Figure 1: Equidistance Data, Unconstrained Left, Lower Bounds Right 

In the second analysis, now using eight objects with equal dissimilarties, we require dij(X) > 1 
for all i,j with \i — j\ = 1, and dij(X) < 2 for all i, j with \i — j\ =2. 

In the unconstrained case we find stress equal to 0.0973520592 after 132 iterations, in the 
constrained case we find 0.1183956861 after 357 iterations. The lower bound constraints are 
all active, which means the distances between successive points are all one (indicated with 
lines in figure 2. The upper bound constraints arc all inactive. 
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Figure 2: Equidistance Data, Unconstrained Left, Lower Bounds Right 

So far we have assumed o tJ < for all i,j. It is possible, however, to let a i3 = f3ij, which 
means we require dp-(A") = = f3ij. Note the complete set of constraints still needs to be 

consistent, and the initial configuration must still be feasible. In our example we require all 
seven distances between successive points to be equal to one. 

Our algorithm in this case becomes extremely slow. It stops at the upper bound of 2000 
iterations, with stress 0.145422896. Of course all constraints are now active. The Lagrange 
multipliers are large, serving as penalty parameters to force the equality constraints. 


## 

[1] 

## 

[7] 

## 

[13] 
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342.484954 

685.017712 


798.640263 
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Figure 3: Equidistance Data, Distances Constrained to One 

Note, however, that the previous analysis where we merly required d t3 (X ) > 1 for \i — j\ — 1 
produced a solution feasible for the current problem with stress equal to 0.1183956861. 
This makes it interesting see what goes on if we increase the upper bound of the number of 
iterations even more, say, to 10000. 

We now find stress 0.1016650356 after 4638 iterations. The Lagrange multipliers for this 
solution are much smaller (first, seven correspond with upper bounds, second seven with 
lower bounds). What basically happens is that for each (i,j) with \i — j\ = 1 either the 
upper or the lower constraint is seen to be violated and gets a zero Lagrange multiplier. The 
corresponding Lagrange multiplier for the corresponding other constraint is positive, because 
it is interpreted as being satisfied. This is a consequence of using floating point with limited 
precision, and the difference between cx %3 = 0.99 ■ ■ • and & %3 = 1.00 

## [1] 0.1769885825 0.0000000000 0.2256740368 0.0000000001 0.0000005800 

## [6] 0.0000000012 0.0707712910 

## [1] 0.0000094723 0.2413717195 0.0000020656 0.0233523251 0.1287322429 

## [6] 0.1247717131 0.0000000002 


The successive distances are 

## [1] 0.9999999397 1.0000000032 1.0000000368 1.0000000221 0.9999999739 
## [6] 0.9999999647 0.9999999492 




Figure 4: Equidistance Data, Distances Constrained to One 

5.2 Dutch Political Parties 1967 

As the next illustration we use data from De Gruijter (1967), with average dissimilarity 
judgments between Dutch political parties in 1967. The data are 

## KVP PvdA VVD ARP CHU CPN PSP BP 

## PvdA 5.63 

## VVD 5.27 6.72 

## ARP 4.60 5.64 5.46 

## CHU 4.80 6.22 4.97 3.20 

## CPN 7.54 5.12 8.13 7.84 7.80 

## PSP 6.73 4.59 7.55 6.73 7.08 4.08 

## BP 7.18 7.22 6.90 7.28 6.96 6.34 6.88 

## D66 6.17 5.47 4.67 6.13 6.04 7.42 6.36 7.36 

First, three different but comparable anlyses are done. The first does not impose restriction, 
the second is MDS from below, which requires dij(X) < Sij for all i, j. And the third is MDS 
from above, for which dij(X ) > 5ij for all i,j. The configurations are quite similar, except for 
the position of D’66, which at the time was a novelty in Dutch politics. The value of stress 
at the solutions is, respectively, 0.044603386, 0.0752702106, and 0.280130691. 
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Figure 5: De Gruijter Data, Unconstrained Solution 
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Figure 6: De Gruijter Data, MDS from Below 
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Figure 7: De Gruijter Data, MDS from Above 

The same data are used in the next analysis, which requires 2 < dij(X ) < 8 for all i, j. We 
converge to stress 0.0668519523 in 119 iterations. There are eight active constrains, with 
four distances equal to the lower bound and four equal to the upper bound. The configuration 
in figure 8 shows the distances equal to the lower bound in blue and those equal to the upper 
bound in red. The active constraints are also clearly visible in the Shepard plot in figure 8. 



dim 1 



Figure 8: De Gruijter Data, Distance Intervals 
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6 Appendix: Code 

6.1 updown.R 


suppressPackageStartupMessages 

suppressPackageStartupMessages 

suppressPackageStartupMessages 

suppressPackageStartupMessages 

source ( " auxilary. R" ) 

source ( "mdsUtils. R" ) 

source ( "nnnewton. R" ) 

source ( "qpqc. R" ) 

source ( " smacofUpDown.R" ) 


(library (mgcv, quietly = TRUE)) 
(library (MASS, quietly = TRUE)) 
(library (lsei, quietly = TRUE)) 
(library (numDeriv, quietly = TRUE)) 


6.2 auxilary.R 

mprint <- function (x, 

d = 6, 
w = 8, 
f = "") { 

print (noquote (formate ( 
x, 

di = d, 
wi = w, 
fo = "f", 
flag = f 
))) 

> 

directSum <- function (x) { 
m <- length (x) 
nr <- sum (sapply (x, nrow)) 
nc <- sum (sapply (x, ncol)) 
z <- matrix (0, nr, nc) 
kr <- 0 
kc <- 0 

for (i in l:m) { 
ir <- nrow (x[ [i] ] ) 
ic <- ncol (x[ [i] ]) 

z [kr + (l:ir), kc + ( 1: ic)] <- x [[i] ] 
kr <- kr + ir 
kc <- kc + ic 

} 

return (z) 
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> 


repList <- function(x, n) { 
z <- list() 
for (i in l:n) 

z <- c(z, list(x)) 
return(z) 


shapeMe <- function (x) { 
m <- length (x) 

n <- (1 + sqrt (1 + 8 * m)) / 2 
d <- matrix (0, n, n) 
k <- 1 

for (i in 2:n) { 

for (j in 1: (i - 1)) { 

d[i, j] <- d[j, i] <- x [k] 
k <- k + 1 

> 

} 

return (d) 


symmetricFromTriangle <- function (x, lower = TRUE, diagonal = TRUE) { 
k <- length (x) 
if (diagonal) 

n <- (sqrt (1+8* k) -1) / 2 
else 

n <- (sqrt (1+8* k) +1) / 2 
if (n != as.integer (n)) 
stop ("input error") 
nn <- 1: n 

if (diagonal && lower) 
m <- outer (nn, nn, ">=") 
if (diagonal && (! lower)) 
m <- outer (nn, nn, "<=") 
if ((Idiagonal) && lower) 
m <- outer (nn, nn, ">") 
if ((Idiagonal) && (! lower)) 
m <- outer (nn, nn, "<") 
b <- matrix (0, n, n) 
b[m] <- x 
b <- b + t(b) 
if (diagonal) 
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diag (b) <- diag(b) / 2 
return (b) 

> 

triangleFromSymmetric <- function (x, lower = TRUE, diagonal = TRUE) { 
n <- ncol (x) 
nn <- 1: n 

if (diagonal && lower) 
m <- outer (nn, nn, ">=") 
if (diagonal && (! lower)) 
m <- outer (nn, nn, "<=") 
if ((Idiagonal) && lower) 
m <- outer (nn, nn, ">") 
if ((Idiagonal) && (! lower)) 
m <- outer (nn, nn, "<") 
return (x Dm]) 


6.3 mdsUtils.R 

library (mgcv) 

torgerson <- function (delta, p = 2) { 

z <- slanczos(-doubleCenter((delta 2) / 2) , p) 
w <- matrix (0, p, p) 
v <- pmax(z$values, 0) 
diag (w) <- sqrt (v) 
return (z$vectors 1*1 w) 


basisPrep <- function (n, p, w) { 
m <- n * (n - 1) / 2 

v <- -symmetricFromTriangle (w, diagonal = FALSE) 

diag (v) <- -rowSums(v) 

ev <- eigen (v) 

eval <- ev$values[1:(n - 1)] 

evec <- ev$vectors[, 1: (n - 1)] 

z <- evec 1*1 diag (1 / sqrt (eval)) 

a <- array (0, c(n - 1, n - 1, m)) 

k <- 1 

for (j in 1: (n-1) ) { 
for (i in (j+1):n) { 
dif <- z [i,] - z[j ,] 
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a [, , k] <- outer (dif, dif) 
k <- k + 1 

> 

} 

return (list (z = z, a = a)) 

> 

center <- function (x) { 

return (apply (x, 2, function (z) z - mean (z))) 

> 

doubleCenter <- function (x) { 
n <- nrow (x) 
j <- diag(n) - (1 / n) 
return (j 1*1 x 1*1 j) 

} 

squareDist <- function (x) { 
d <- diag (x) 

return (outer (d, d, "+") - 2 * x) 


6.4 smacofUpDown.R 

smacofUpDown <- 
function (delta, 
xini, 

w = rep (1, length (delta)), 
bndlw = rep (0, length (delta)), 
bndup = rep (Inf, length (delta)), 
itmax = 1000, 
eps = le-10, 
verbose = FALSE) { 
m <- length (delta) 
p <- dim (xini)[2] 
n <- (1 + sqrt (1 + 8 * m)) / 2 
dini <- as.vector (dist (xini)) 
if (any (dini > bndup) I I any (dini < bndlw)) 
stop ("initial estimate not feasible") 
wndup <- which (bndup < Inf) 
lup <- length (wndup) 
wndlw <- which (bndlw > 0) 
llw <- length (wndlw) 
ltt <- lup + llw 
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r <- p * (n - 1) 
yold <- rep (0, ltt) 
h <- basisPrep (n, p, w) 
xold <- rep (0, r) 
for (s in 1: p) { 

k <- (s - 1) * (n - 1) + 1: (n - 1) 
xold[k] <- 

crossprod (h$z, xini[, s]) / diag (crossprod (h$z)) 

> 

d <- rep (0, m) 
itel <- 1 
sold <- Inf 

ssq <- sum (w * delta ~ 2) 
repeat { 

xmid <- rep (0, r) 

xx <- matrix (xold, n - 1, p) 

t <- 1 

bcomp <- matrix (0, r, ltt + 1) 
for (k in l:m) { 
ak <- h$a[, , k] 
ax <- ak °/ 0 *% xx 
d[k] <- sqrt (sum (xx * ax)) 
if (lis.na (match (k, wndlw))) { 
bcomp[, 1 + lup + t] <- -ax / d[k] 
t <- t + 1 

} 

xmid <- xmid + w[k] * (delta[k] / d[k]) * ax 

> 

if (ltt > 0) { 

ccomp <- c(c(ssq, -bndup[wndup] 2) / 2, bndlw[wndlw]) 

bcomp [, 1] <- -xmid 

acomp <- array (0, c(r, r, ltt + 1)) 

acomp[, , 1] <- diag (r) 

t <- 1 

for (k in wndup) { 

ak <- directSum (repList (h$a[, , k], p)) 
acomp[, , t + 1] <- ak 
t <- t + 1 

} 

v <- qpqc (yold, acomp, bcomp, ccomp, verbose = FALSE) 

snew <- 2 * v$f / ssq 

xnew <- v$xmin 

ynew <- v$multipliers 

cons <- v$constraints 
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} 

else { 

xnew <- xmid 

snew <- sum (w * (delta - d) ~ 2) / ssq 
ynew <- NULL 
cons <- NULL 

} 

if (verbose) 
cat ( 

"Iteration: ", 

formate (itel, width = 3, format = "d"), 
"sold: ", 
formate ( 
sold, 

digits = 8, 
width = 12, 
format = "f" 

), 

"snew: ", 
formate ( 
snew, 

digits = 8, 
width = 12, 
format = "f" 

), 

"\n" 

) 

if ((itel == itmax) II ((sold - snew) < eps)) 
break 

xold <- xnew 
sold <- snew 
yold <- ynew 
itel <- itel + 1 

> 

xconf <- matrix (0, n, p) 
for (s in l:p) { 

k <- (s - 1) * (n - 1) + 1: (n - 1) 
xconf[, s] <- h$z xnew[k] 

> 

return ( 
list ( 

delta = delta, 
dist = d, 
x = xconf. 
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multipliers = ynew, 
constraints = cons, 
itel = itel, 

stress = sum (w * (delta - d) * 2) / ssq 

) 

) 

} 
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